Goto

Collaborating Authors

 multipartite ranking



When False Positive is Intolerant: End-to-End Optimization with Low FPR for Multipartite Ranking

Neural Information Processing Systems

Multipartite ranking is a basic task in machine learning, where the Area Under the receiver operating characteristics Curve (AUC) is generally applied as the evaluation metric. Despite that AUC reflects the overall performance of the model, it is inconsistent with the expected performance in some application scenarios, where only a low False Positive Rate (FPR) is meaningful. To leverage high performance under low FPRs, we consider an alternative metric for multipartite ranking evaluating the True Positive Rate (TPR) at a given FPR, denoted as TPR@FPR. Unfortunately, the key challenge of direct TPR@FPR optimization is two-fold: \textbf{a)} the original objective function is not differentiable, making gradient backpropagation impossible; \textbf{b)} the loss function could not be written as a sum of independent instance-wise terms, making mini-batch based optimization infeasible. To address these issues, we propose a novel framework on top of the deep learning framework named \textit{Cross-Batch Approximation for Multipartite Ranking (CBA-MR)}. In face of \textbf{a)}, we propose a differentiable surrogate optimization problem where the instances having a short-time effect on FPR are rendered with different weights based on the random walk hypothesis. To tackle \textbf{b)}, we propose a fast ranking estimation method, where the full-batch loss evaluation is replaced by a delayed update scheme with the help of an embedding cache. Finally, experimental results on four real-world benchmarks are provided to demonstrate the effectiveness of the proposed method.



When False Positive is Intolerant: End-to-End Optimization with Low FPR for Multipartite Ranking

Neural Information Processing Systems

Multipartite ranking is a basic task in machine learning, where the Area Under the receiver operating characteristics Curve (AUC) is generally applied as the evaluation metric. Despite that AUC reflects the overall performance of the model, it is inconsistent with the expected performance in some application scenarios, where only a low False Positive Rate (FPR) is meaningful. To leverage high performance under low FPRs, we consider an alternative metric for multipartite ranking evaluating the True Positive Rate (TPR) at a given FPR, denoted as TPR@FPR. Unfortunately, the key challenge of direct TPR@FPR optimization is two-fold: \textbf{a)} the original objective function is not differentiable, making gradient backpropagation impossible; \textbf{b)} the loss function could not be written as a sum of independent instance-wise terms, making mini-batch based optimization infeasible. To address these issues, we propose a novel framework on top of the deep learning framework named \textit{Cross-Batch Approximation for Multipartite Ranking (CBA-MR)}.


Ranking Data with Continuous Labels through Oriented Recursive Partitions

Clémençon, Stephan, Achab, Mastane

arXiv.org Machine Learning

We formulate a supervised learning problem, referred to as continuous ranking, where a continuous real-valued label Y is assigned to an observable r.v. X taking its values in a feature space $\mathcal{X}$ and the goal is to order all possible observations x in $\mathcal{X}$ by means of a scoring function $s:\mathcal{X}\rightarrow \mathbb{R}$ so that s(X) and Y tend to increase or decrease together with highest probability. This problem generalizes bi/multi-partite ranking to a certain extent and the task of finding optimal scoring functions s(x) can be naturally cast as optimization of a dedicated functional criterion, called the IROC curve here, or as maximization of the Kendall ${\tau}$ related to the pair (s(X), Y ). From the theoretical side, we describe the optimal elements of this problem and provide statistical guarantees for empirical Kendall ${\tau}$ maximization under appropriate conditions for the class of scoring function candidates. We also propose a recursive statistical learning algorithm tailored to empirical IROC curve optimization and producing a piecewise constant scoring function that is fully described by an oriented binary tree. Preliminary numerical experiments highlight the difference in nature between regression and continuous ranking and provide strong empirical evidence of the performance of empirical optimizers of the criteria proposed.


Ranking Data with Continuous Labels through Oriented Recursive Partitions

Clémençon, Stéphan, Achab, Mastane

Neural Information Processing Systems

We formulate a supervised learning problem, referred to as continuous ranking, where a continuous real-valued label Y is assigned to an observable r.v. X taking its values in a feature space X and the goal is to order all possible observations x in X by means of a scoring function s : X → R so that s(X) and Y tend to increase or decrease together with highest probability. This problem generalizes bi/multi-partite ranking to a certain extent and the task of finding optimal scoring functions s(x) can be naturally cast as optimization of a dedicated functional cri- terion, called the IROC curve here, or as maximization of the Kendall τ related to the pair (s(X), Y ). From the theoretical side, we describe the optimal elements of this problem and provide statistical guarantees for empirical Kendall τ maximiza- tion under appropriate conditions for the class of scoring function candidates. We also propose a recursive statistical learning algorithm tailored to empirical IROC curve optimization and producing a piecewise constant scoring function that is fully described by an oriented binary tree. Preliminary numerical experiments highlight the difference in nature between regression and continuous ranking and provide strong empirical evidence of the performance of empirical optimizers of the criteria proposed.